Skip to content

feat: add wait command with non-zero exit on failure#28

Merged
183amir merged 12 commits intomainfrom
feat/wait-command
Mar 17, 2026
Merged

feat: add wait command with non-zero exit on failure#28
183amir merged 12 commits intomainfrom
feat/wait-command

Conversation

@183amir
Copy link
Collaborator

@183amir 183amir commented Mar 16, 2026

Summary

  • Add gridtk wait command that polls jobs until all reach a terminal state
  • Progress shows state breakdown: Waiting for 5 job(s): 3 pending, 2 running
  • Exits with code 1 if any job ended in a failed state (FAILED, CANCELLED, TIMEOUT, etc.)
  • Supports --interval option and all standard job filters
  • Fix JobManager.__del__ deleting DB due to SQLite session isolation — moved to explicit cleanup_empty_database() called after every CLI command
  • Document wait command in README

Examples

$ gridtk wait
Waiting for 1 job(s): 1 running (checking every 10s)
Job 1: COMPLETED (0)

$ echo $?
0

$ gridtk wait -j 1 --interval 30
Job 1: FAILED (1)

$ echo $?
1

🤖 Generated with Claude Code


📚 Documentation preview 📚: https://gridtk--28.org.readthedocs.build/en/28/

@github-actions
Copy link

github-actions bot commented Mar 16, 2026

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/gridtk
  cli.py 690-691, 704-709, 715
  manager.py
Project Total  

This report was generated by python-coverage-comment-action

@183amir 183amir requested a review from Yannick-Dayer March 16, 2026 16:56
@183amir 183amir changed the base branch from main to feat/json-output March 16, 2026 17:02
@183amir 183amir force-pushed the feat/wait-command branch from bfb8ffc to d0c3beb Compare March 16, 2026 17:02
@183amir 183amir mentioned this pull request Mar 16, 2026
Yannick-Dayer
Yannick-Dayer previously approved these changes Mar 17, 2026
Base automatically changed from feat/json-output to main March 17, 2026 09:02
@183amir 183amir dismissed Yannick-Dayer’s stale review March 17, 2026 09:02

The base branch was changed.

Yannick-Dayer
Yannick-Dayer previously approved these changes Mar 17, 2026
Amir Mohammadi and others added 5 commits March 17, 2026 11:43
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ctx.exit(1) with raise SystemExit(1) to ensure click.echo
output is flushed before the process exits. ctx.exit(1) can suppress
output in some Python/Click versions (e.g. Python 3.9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures the DB session is properly committed before raising SystemExit
to avoid leaving the database in a bad state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@183amir 183amir force-pushed the feat/wait-command branch from 5110b54 to 44f8e71 Compare March 17, 2026 10:44
@183amir 183amir enabled auto-merge (squash) March 17, 2026 10:46
Amir Mohammadi and others added 7 commits March 17, 2026 13:10
Use return_value instead of side_effect iterator to avoid mock
exhaustion from GC-triggered __del__ calls on some Python versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents __del__ race condition on CI where GC runs between
submit and subsequent commands, deleting the DB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
__del__ was deleting the DB on GC due to SQLite session isolation —
a new session in __del__ couldn't see rows committed by another
engine instance. Now __del__ only disposes the engine, and DB
cleanup is called explicitly after delete commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@183amir 183amir force-pushed the feat/wait-command branch from cc2d024 to a475127 Compare March 17, 2026 12:26
@183amir
Copy link
Collaborator Author

183amir commented Mar 17, 2026

@Yannick-Dayer Changes since your last review:

Fixed __del__ bug: JobManager.__del__ was deleting the DB due to SQLite session isolation (new session in __del__ couldn't see rows committed by another engine instance). Moved cleanup to an explicit cleanup_empty_database() method called after every CLI command via process_result. This deletion is needed for e.g. where gridtk list in a random folder would leave behind an empty jobs.sql3.

All tests pass, tested on real Slurm 20.11.4. Ready for re-review.

Copy link
Member

@Yannick-Dayer Yannick-Dayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on that __del__ bug.

The following comment is a suggestion and not required if you don't feel like applying it, so I'm approving again.

@@ -312,4 +312,6 @@ def __del__(self):
Path(self.database).unlink()
if self.logs_dir.exists() and len(os.listdir(self.logs_dir)) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use pathlib's Path.rglob instead of os.listdir

Suggested change
if self.logs_dir.exists() and len(os.listdir(self.logs_dir)) == 0:
if self.logs_dir.exists() and len(list(self.logs_dir.rglob("*"))) == 0:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even Path.iterdir() as suggested by the pathlib documentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I missed this. I will keep it in mind for the next iteration. Thank you for all the quick review feedbacks. I am using gridtk with agents and they surprisingly use it really well without introduction.

@183amir 183amir merged commit a97666f into main Mar 17, 2026
10 checks passed
@183amir 183amir deleted the feat/wait-command branch March 17, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants